Art of Assembly: Chapter Six

Art of Assembly Language: Chapter Six

Chapter Six - The 80x86 Instruction Set
6.0 - Chapter Overview
6.1 - The Processor Status Register (Flags)
6.2 - Instruction Encodings
6.3 - Data Movement Instructions
6.3.1 - The MOV Instruction
6.3.2 - The XCHG Instruction
6.3.3 - The LDS, LES, LFS, LGS, and LSS Instructions
6.3.4 - The LEA Instruction
6.3.5 - The PUSH and POP Instructions
6.3.6 - The LAHF and SAHF Instructions
6.4 - Conversions
6.4.1 - The MOVZX, MOVSX, CBW, CWD, CWDE, and CDQ Instructions
6.4.2 - The BSWAP Instruction
6.4.3 - The XLAT Instruction
6.5 - Arithmetic Instructions
6.5.1 - The Addition Instructions: ADD, ADC, INC, XADD, AAA, and DAA
6.5.1.1 - The ADD and ADC Instructions
6.5.1.2 - The INC Instruction
6.5.1.3 - The XADD Instruction
6.5.1.4 - The AAA and DAA Instructions
6.5.2 - The Subtraction Instructions: SUB, SBB, DEC, AAS, and DAS
6.5.3 - The CMP Instruction
6.5.4 - The CMPXCHG, and CMPXCHG8B Instructions
6.5.5 - The NEG Instruction
6.5.6 - The Multiplication Instructions: MUL, IMUL, and AAM
6.5.7 - The Division Instructions: DIV, IDIV, and AAD
6.6 - Logical, Shift, Rotate and Bit Instructions
6.6.1 - The Logical Instructions: AND, OR, XOR, and NOT
6.6.2 - The Shift Instructions: SHL/SAL, SHR, SAR, SHLD, and SHRD
6.6.2.1 - SHL/SAL
6.6.2.2 - SAR
6.6.2.3 - SHR
6.6.2.4 - The SHLD and SHRD Instructions
6.6.3 - The Rotate Instructions: RCL, RCR, ROL, and ROR
6.6.3.1 - RCL
6.6.3.2 - RCR
6.6.3.3 - ROL
6.6.3.4 - ROR
6.6.4 - The Bit Operations
6.6.4.1 - TEST
6.6.4.2 - The Bit Test Instructions: BT, BTS, BTR, and BTC
6.6.4.3 - Bit Scanning: BSF and BSR
6.6.5 - The "Set on Condition" Instructions
6.7 - I/O Instructions
6.8 - String Instructions
6.9 - Program Flow Control Instructions
6.9.1 - Unconditional Jumps
6.9.2 - The CALL and RET Instructions
6.9.3 - The INT, INTO, BOUND, and IRET Instructions
6.9.4 - The Conditional Jump Instructions
6.9.5 - The JCXZ/JECXZ Instructions
6.9.6 - The LOOP Instruction
6.9.7 - The LOOPE/LOOPZ Instruction
6.9.8 - The LOOPNE/LOOPNZ Instruction
6.10 - Miscellaneous Instructions
6.11 - Sample Programs
6.11.1 - Simple Arithmetic I
6.11.2 - Simple Arithmetic II
6.11.3 - Logical Operations
6.11.4 - Shift and Rotate Operations
6.11.5 - Bit Operations and SETcc Instructions
6.11.6 - String Operations
6.11.7 - Conditional Jumps
6.11.8 - CALL and INT Instructions
6.11.9 - Conditional Jumps I
6.11.10 - Conditional Jump Instructions II

Copyright 1996 by Randall Hyde

All rights reserved.

Duplication other than for immediate display through a browser is prohibited by U.S. Copyright Law.

This material is provided on-line as a beta-test of this text. It is for the personal use of the reader only. If you are interested in using this material as part of a course, please contact

rhyde@cs.ucr.edu

Supporting software and other materials are available via anonymous ftp from ftp.cs.ucr.edu. See the "/pub/pc/ibmpcdir" directory for details. You may also download the material from "Randall Hyde's Assembly Language Page" at URL:

http://webster.ucr.edu

Notes:

This document does not contain the laboratory exercises, programming assignments, exercises, or chapter summary. These portions were omitted for several reasons: either they wouldn't format properly, they contained hyperlinks that were too much work to resolve, they were under constant revision, or they were not included for security reasons. Such omission should have very little impact on the reader interested in learning this material or evaluating this document.

This document was prepared using Harlequin's Web Maker 2.2 and Quadralay's Webworks Publisher. Since HTML does not support the rich formatting options available in Framemaker, this document is only an approximation of the actual chapter from the textbook.

If you are absolutely dying to get your hands on a version other than HTML, you might consider having the UCR Printing a Reprographics Department run you off a copy on their Xerox machines. For details, please read the following EMAIL message I received from the Printing and Reprographics Department:

Hello Again Professor Hyde,

Dallas gave me permission to take orders for the Computer Science 13 Manuals. We would need to take charge card orders. The only cards we take are: Master Card, Visa, and Discover. They would need to send the name, numbers, expiration date, type of card, and authorization to charge $95.00 for the manual and shipping, also we should have their phone number in case the company has any trouble delivery. They can use my e-mail address for the orders and I will process them as soon as possible. I would assume that two weeks would be sufficient for printing, packages and delivery time.

I am open to suggestions if you can think of any to make this as easy as possible.

Thank You for your business,

Kathy Chapman, Assistant
Printing and Reprographics
University of California
Riverside
(909) 787-4443/4444

We are currently working on ways to publish this text in a form other than HTML (e.g., Postscript, PDF, Frameviewer, hard copy, etc.). This, however, is a low-priority project. Please do not contact Randall Hyde concerning this effort. When something happens, an announcement will appear on "Randall Hyde's Assembly Language Page." Please visit this WEB site at http://webster.ucr.edu for the latest scoop.

Art of Assembly Bug Report Submissions

Did you find an error in The Art of Assembly Language Programming? You can let me know by using the form below to report the error to me so that I can correct the error for the next beta version. Thank you.

The Submission Form

Please provide your name and e-mail address so I can contact you if I have any questions regarding your submission.

Chapter Six The 80x86 Instruction Set

Until now, there has been little discussion of the instructions available on the 80x86 microprocessor. This chapter rectifies this situation. Note that this chapter is mainly for reference. It explains what each instruction does, it does not explain how to combine these instructions to form complete assembly language programs. The rest of this book will explain how to do that.

6.0 Chapter Overview

This chapter discusses the 80x86 real mode instruction set. Like any programming language, there are going to be several instructions you use all the time, some you use occasionally, and some you will rarely, if ever, use. This chapter organizes its presentation by instruction class rather than importance. Since beginning assembly language programmers do not have to learn the entire instruction set in order to write meaningful assembly language programs, you will probably not have to learn how every instruction operates. The following list describes the instructions this chapter discusses. A "" symbol marks the important instructions in each group. If you learn only these instructions, you will probably be able to write any assembly language program you want. There are many additional instructions, especially on the 80386 and later processors. These additional instructions make assembly language programming easier, but you do not need to know them to begin writing programs.

80x86 instructions can be (roughly) divided into eight different classes:

1) Data movement instructions

mov, lea, les , push, pop, pushf, popf

2) Conversions

cbw, cwd, xlat

3) Arithmetic instructions

add, inc sub, dec, cmp, neg, mul, imul, div, idiv

4) Logical, shift, rotate, and bit instructions

and, or, xor, not, shl, shr, rcl, rcr

5) I/O instructions

in, out

6) String instructions

movs, stos, lods

7) Program flow control instructions

jmp, call, ret, conditional jumps

8) Miscellaneous instructions.

clc, stc, cmc
The following sections describe all the instructions in these groups and how they operate.

At one time a text such as this one would recommend against using the extended 80386 instruction set. After all, programs that use such instructions will not run properly on 80286 and earlier processors. Using these additional instructions could limit the number of machines your code would run on. However, the 80386 processor is on the verge of disappearing as this text is being written. You can safely assume that most systems will contain an 80386sx or later processor. This text often uses the 80386 instruction set in various example programs. Keep in mind, though, that this is only for convenience. There is no program that appears in this text that could not be recoded using only 8088 assembly language instructions.

A word of advice, particularly to those who learn only the instructions noted above: as you read about the 80x86 instruction set you will discover that the individual 80x86 instructions are not very complex and have simple semantics. However, as you approach the end of this chapter, you may discover that you haven't got a clue how to put these simple instructions together to form a complex program. Fear not, this is a common problem. Later chapters will describe how to form complex programs from these simple instructions.

One quick note: this chapter lists many instructions as "available only on the 80286 and later processors." In fact, many of these instructions were available on the 80186 microprocessor as well. Since few PC systems employ the 80186 microprocessor, this text ignores that CPU. However, to keep the record straight...

6.1 The Processor Status Register (Flags)

The flags register maintains the current operating mode of the CPU and some instruction state information.The figure below shows the layout of the flags register:

The carry, parity, zero, sign, and overflow flags are special because you can test their status (zero or one) with the setcc and conditional jump instructions (see "The "Set on Condition" Instructions" on page 281 and "The Conditional Jump Instructions" on page 296). The 80x86 uses these bits, the condition codes, to make decisions during program execution.

Various arithmetic, logical, and miscellaneous instructions affect the overflow flag. After an arithmetic operation, this flag contains a one if the result does not fit in the signed destination operand. For example, if you attempt to add the 16 bit signed numbers 7FFFh and 0001h the result is too large so the CPU sets the overflow flag. If the result of the arithmetic operation does not produce a signed overflow, then the CPU clears this flag.

Since the logical operations generally apply to unsigned values, the 80x86 logical instructions simply clear the overflow flag. Other 80x86 instructions leave the overflow flag containing an arbitrary value.

The 80x86 string instructions use the direction flag. When the direction flag is clear, the 80x86 processes string elements from low addresses to high addresses; when set, the CPU processes strings in the opposite direction. See "String Instructions" on page 284 for additional details.

The interrupt enable/disable flag controls the 80x86's ability to respond to external events known as interrupt requests. Some programs contain certain instruction sequences that the CPU must not interrupt. The interrupt enable/disable flag turns interrupts on or off to guarantee that the CPU does not interrupt those critical sections of code.

The trace flag enables or disables the 80x86 trace mode. Debuggers (such as CodeView) use this bit to enable or disable the single step/trace operation. When set, the CPU interrupts each instruction and passes control to the debugger software, allowing the debugger to single step through the application. If the trace bit is clear, then the 80x86 executes instructions without the interruption. The 80x86 CPUs do not provide any instructions that directly manipulate this flag. To set or clear the trace flag, you must:

Push the flags onto the 80x86 stack,
Pop the value into another register,
Tweak the trace flag value,
Push the result onto the stack, and then
Pop the flags off the stack.

If the result of some computation is negative, the 80x86 sets the sign flag. You can test this flag after an arithmetic operation to check for a negative result. Remember, a value is negative if its H.O. bit is one. Therefore, operations on unsigned values will set the sign flag if the result has a one in the H.O. position.

Various instructions set the zero flag when they generate a zero result. You'll often use this flag to see if two values are equal (e.g., after subtracting two numbers, they are equal if the result is zero). This flag is also useful after various logical operations to see if a specific bit in a register or memory location contains zero or one.

The auxiliary carry flag supports special binary coded decimal (BCD) operations. Since most programs don't deal with BCD numbers, you'll rarely use this flag and even then you'll not access it directly. The 80x86 CPUs do not provide any instructions that let you directly test, set, or clear this flag. Only the add, adc, sub, sbb, mul, imul, div, idiv, and BCD instructions manipulate this flag.

The parity flag is set according to the parity of the L.O. eight bits of any data operation. If an operation produces an even number of one bits, the CPU sets this flag. It clears this flag if the operation yields an odd number of one bits. This flag is useful in certain data communications programs, however, Intel provided it mainly to provide some compatibility with the older 8080 mP.

The carry flag has several purposes. First, it denotes an unsigned overflow (much like the overflow flag detects a signed overflow). You will also use it during multiprecision arithmetic and logical operations. Certain bit test, set, clear, and invert instructions on the 80386 directly affect this flag. Finally, since you can easily clear, set, invert, and test it, it is useful for various boolean operations. The carry flag has many purposes and knowing when to use it, and for what purpose, can confuse beginning assembly language programmers. Fortunately, for any given instruction, the meaning of the carry flag is clear.

The use of these flags will become readily apparent in the coming sections and chapters. This section is mainly a formal introduction to the individual flags in the register rather than an attempt to explain the exact function of each flag. For more details on the operation of each flag, keep reading...

6.2 Instruction Encodings

The 80x86 uses a binary encoding for each machine operation. While it is important to have a general understanding of how the 80x86 encodes instructions, it is not important that you memorize the encodings for all the instructions in the instruction set. If you were to write an assembler or disassembler (debugger), you would definitely need to know the exact encodings. For general assembly language programming, however, you won't need to know the exact encodings.

However, as you become more experienced with assembly language you will probably want to study the encodings of the 80x86 instruction set. Certainly you should be aware of such terms as opcode, mod-reg-r/m byte, displacement value, and so on. Although you do not need to memorize the parameters for each instruction, it is always a good idea to know the lengths and cycle times for instructions you use regularly since this will help you write better programs. Chapter Three and Chapter Four provided a detailed look at instruction encodings for various instructions (80x86 and x86); such a discussion was important because you do need to understand how the CPU encodes and executes instructions. This chapter does not deal with such details. This chapter presents a higher level view of each instruction and assumes that you don't care how the machine treats bits in memory. For those few times that you will need to know the binary encoding for a particular instruction, a complete listing of the instruction encodings appears in Appendix D.

6.3 Data Movement Instructions

The data movement instructions copy values from one location to another. These instructions include mov, xchg, lds, lea, les, lfs, lgs, lss, push, pusha, pushad, pushf, pushfd, pop, popa, popad, popf, popfd, lahf, and sahf.

6.3.1 The MOV Instruction

The mov instruction takes several different forms:

                mov     reg, reg
                mov     mem, reg
                mov     reg, mem
                mov     mem, immediate data
                mov     reg, immediate data
                mov     ax/al, mem 
                mov     mem, ax/al 
                mov     segreg, mem16
                mov     segreg, reg16
                mov     mem16, segreg
                mov     reg16, segreg

The last chapter discussed the mov instruction in detail, only a few minor comments are worthwhile here. First, there are variations of the mov instruction that are faster and shorter than other mov instructions that do the same job. For example, both the mov ax, mem and mov reg, mem instructions can load the ax register from a memory location. On all processors the first version is shorter. On the earlier members of the 80x86 family, it is faster as well.

There are two very important details to note about the mov instruction. First, there is no memory to memory move operation. The mod-reg-r/m addressing mode byte (see Chapter Four) allows two register operands or a single register and a single memory operand. There is no form of the mov instruction that allows you to encode two memory addresses into the same instruction. Second, you cannot move immediate data into a segment register. The only instructions that move data into or out of a segment register have mod-reg-r/m bytes associated with them; there is no format that moves an immediate value into a segment register. Two common errors beginning programmers make are attempting a memory to memory move and trying to load a segment register with a constant.

The operands to the mov instruction may be bytes, words, or double words. Both operands must be the same size or MASM will generate an error while assembling your program. This applies to memory operands and register operands. If you declare a variable, B, using byte and attempt to load this variable into the ax register, MASM will complain about a type conflict.

The CPU extends immediate data to the size of the destination operand (unless it is too big to fit in the destination operand, which is an error). Note that you can move an immediate value into a memory location. The same rules concerning size apply. However, MASM cannot determine the size of certain memory operands. For example, does the instruction mov [bx], 0 store an eight bit, sixteen bit, or thirty-two bit value? MASM cannot tell, so it reports an error. This problem does not exist when you move an immediate value into a variable you've declared in your program. For example, if you've declared B as a byte variable, MASM knows to store an eight bit zero into B for the instruction mov B, 0. Only those memory operands involving pointers with no variable operands suffer from this problem. The solution is to explicitly tell MASM whether the operand is a byte, word, or double word. You can accomplish this with the following instruction forms:

                mov     byte ptr [bx], 0
                mov     word ptr [bx], 0
                mov     dword ptr [bx], 0       (3)

(3) Available only on 80386 and later processors

For more details on the type ptr operator, see Chapter Eight.

Moves to and from segment registers are always 16 bits; the mod-reg-r/m operand must be 16 bits or MASM will generate an error. Since you cannot load a constant directly into a segment register, a common solution is to load the constant into an 80x86 general purpose register and then copy it to the segment register. For example, the following two instruction sequence loads the es register with the value 40h:

                mov     ax, 40h
                mov     es, ax

Note that almost any general purpose register would suffice. Here, ax was chosen arbitrarily.

The mov instructions do not affect any flags. In particular, the 80x86 preserves the flag values across the execution of a mov instruction.

6.3.2 The XCHG Instruction

The xchg (exchange) instruction swaps two values. The general form is

                xchg    operand1, operand2

There are four specific forms of this instruction on the 80x86:

                xchg    reg, mem
                xchg    reg, reg
                xchg    ax, reg16
                xchg    eax, reg32              (3)

(3) Available only on 80386 and later processors

The first two general forms require two or more bytes for the opcode and mod-reg-r/m bytes (a displacement, if necessary, requires additional bytes). The third and fourth forms are special forms of the second that exchange data in the (e)ax register with another 16 or 32 bit register. The 16 bit form uses a single byte opcode that is shorter than the other two forms that use a one byte opcode and a mod-reg-r/m byte.

Already you should note a pattern developing: the 80x86 family often provides shorter and faster versions of instructions that use the ax register. Therefore, you should try to arrange your computations so that they use the (e)ax register as much as possible. The xchg instruction is a perfect example, the form that exchanges 16 bit registers is only one byte long.

Note that the order of the xchg's operands does not matter. That is, you could enter xchg mem, reg and get the same result as xchg reg, mem. Most modern assemblers will automatically emit the opcode for the shorter xchg ax, reg instruction if you specify xchg reg, ax.

Both operands must be the same size. On pre-80386 processors the operands may be eight or sixteen bits. On 80386 and later processors the operands may be 32 bits long as well.

The xchg instruction does not modify any flags.

6.3.3 The LDS, LES, LFS, LGS, and LSS Instructions

The lds, les, lfs, lgs, and lss instructions let you load a 16 bit general purpose register and segment register pair with a single instruction. On the 80286 and earlier, the lds and les instructions are the only instructions that directly process values larger than 32 bits. The general form is

                LxS     dest, source

These instructions take the specific forms:

                lds     reg16, mem32
                les     reg16, mem32
                lfs     reg16, mem32    (3)
                lgs     reg16, mem32    (3)
                lss     reg16, mem32    (3)

(3) Available only on 80386 and later processors

Reg16 is any general purpose 16 bit register and mem32 is a double word memory location (declared with the dword statement).

These instructions will load the 32 bit double word at the address specified by mem32 into reg16 and the ds, es, fs, gs, or ss registers. They load the general purpose register from the L.O. word of the memory operand and the segment register from the H.O. word. The following algorithms describe the exact operation:

lds reg16, mem32:
                reg16 := [mem32]
                ds := [mem32 + 2]
les reg16, mem32:
                reg16 := [mem32]
                es := [mem32 + 2]
lfs reg16, mem32:
                reg16 := [mem32]
                fs := [mem32 + 2]
lgs reg16, mem32:
                reg16 := [mem32]
                gs := [mem32 + 2]
lss reg16, mem32:
                reg16 := [mem32]
                ss := [mem32 + 2]

Since the LxS instructions load the 80x86's segment registers, you must not use these instructions for arbitrary purposes. Use them to set up (far) pointers to certain data objects as discussed in Chapter Four. Any other use may cause problems with your code if you attempt to port it to Windows, OS/2 or UNIX.

Keep in mind that these instructions load the four bytes at a given memory location into the register pair; they do not load the address of a variable into the register pair (i.e., this instruction does not have an immediate mode). To learn how to load the address of a variable into a register pair, see Chapter Eight.

The LxS instructions do not affect any of the 80x86's flag bits.

6.3.4 The LEA Instruction

The lea (Load Effective Address) instruction is another instruction used to prepare pointer values. The lea instruction takes the form:

                lea     dest, source

The specific forms on the 80x86 are

                lea     reg16, mem
                lea     reg32, mem      (3)

(3) Available only on 80386 and later processors.

It loads the specified 16 or 32 bit general purpose register with the effective address of the specified memory location. The effective address is the final memory address obtained after all addressing mode computations. For example, lea ax, ds:[1234h] loads the ax register with the address of memory location 1234h; here it just loads the ax register with the value 1234h. If you think about it for a moment, this isn't a very exciting operation. After all, the mov ax, immediate_data instruction can do this. So why bother with the lea instruction at all? Well, there are many other forms of a memory operand besides displacement-only operands. Consider the following lea instructions:

                lea     ax, [bx]
                lea     bx, 3[bx]
                lea     ax, 3[bx]
                lea     bx, 4[bp+si]
                lea     ax, -123[di]

The lea ax, [bx] instruction copies the address of the expression [bx] into the ax register. Since the effective address is the value in the bx register, this instruction copies bx's value into the ax register. Again, this instruction isn't very interesting because mov can do the same thing, even faster.

The lea bx,3[bx] instruction copies the effective address of 3[bx] into the bx register. Since this effective address is equal to the current value of bx plus three, this lea instruction effectively adds three to the bx register. There is an add instruction that will let you add three to the bx register, so again, the lea instruction is superfluous for this purpose.

The third lea instruction above shows where lea really begins to shine. lea ax, 3[bx] copies the address of the memory location 3[bx] into the ax register; i.e., it adds three with the value in the bx register and moves the sum into ax. This is an excellent example of how you can use the lea instruction to do a mov operation and an addition with a single instruction.

The final two instructions above, lea bx,4[bp+si] and lea ax,-123[di] provide additional examples of lea instructions that are more efficient than their mov/add counterparts.

On the 80386 and later processors, you can use the scaled indexed addressing modes to multiply by two, four, or eight as well as add registers and displacements together. Intel strongly suggests the use of the lea instruction since it is much faster than a sequence of instructions computing the same result.

The (real) purpose of lea is to load a register with a memory address. For example, lea bx, 128[bp+di] sets up bx with the address of the byte referenced by 128[BP+DI]. As it turns out, an instruction of the form mov al,[bx] runs faster than an instruction of the form mov al,128[bp+di]. If this instruction executes several times, it is probably more efficient to load the effective address of 128[bp+di] into the bx register and use the [bx] addressing mode. This is a common optimization in high performance programs.

The lea instruction does not affect any of the 80x86's flag bits.

6.3.5 The PUSH and POP Instructions

The 80x86 push and pop instructions manipulate data on the 80x86's hardware stack. There are 19 varieties of the push and pop instructions, they are

                push    reg16
                pop     reg16
                push    reg32           (3)
                pop     reg32           (3)
                push    segreg
                pop     segreg          (except CS)
                push    memory
                pop     memory
                push    immediate_data  (2)
                pusha                   (2)
                popa                    (2)
                pushad                  (3)
                popad                   (3)
                pushf
                popf
                pushfd                  (3)
                popfd                   (3)
                enter   imm, imm        (2)
                leave                   (2)

(2)- Available only on 80286 and later processors.
(3)- Available only on 80386 and later processors.

The first two instructions push and pop a 16 bit general purpose register. This is a compact (one byte) version designed specifically for registers. Note that there is a second form that provides a mod-reg-r/m byte that could push registers as well; most assemblers only use that form for pushing the value of a memory location.

The second pair of instructions push or pop an 80386 32 bit general purpose register. This is really nothing more than the push register instruction described in the previous paragraph with a size prefix byte.

The third pair of push/pop instructions let you push or pop an 80x86 segment register. Note that the instructions that push fs and gs are longer than those that push cs, ds, es, and ss, see Appendix D for the exact details. You can only push the cs register (popping the cs register would create some interesting program flow control problems).

The fourth pair of push/pop instructions allow you to push or pop the contents of a memory location. On the 80286 and earlier, this must be a 16 bit value. For memory operations without an explicit type (e.g., [bx]) you must either use the pushw mnemonic or explicitly state the size using an instruction like push word ptr [bx]. On the 80386 and later you can push and pop 16 or 32 bit values. You can use dword memory operands, you can use the pushd mnemonic, or you can use the dword ptr operator to force 32 bit operation. Examples:

                push    DblWordVar
                push    dword ptr [bx]
                pushd   dword

The pusha and popa instructions (available on the 80286 and later) push and pop all the 80x86 16 bit general purpose registers. Pusha pushes the registers in the following order: ax, cx, dx, bx, sp, bp, si, and then di. Popa pops these registers in the reverse order. Pushad and Popad (available on the 80386 and later) do the same thing on the 80386's 32 bit register set. Note that these "push all" and "pop all" instructions do not push or pop the flags or segment registers.

The pushf and popf instructions allow you to push/pop the processor status register (the flags). Note that these two instructions provide a mechanism to modify the 80x86's trace flag. See the description of this process earlier in this chapter. Of course, you can set and clear the other flags in this fashion as well. However, most of the other flags you'll want to modify (specifically, the condition codes) provide specific instructions or other simple sequences for this purpose.

Enter and leave push/pop the bp register and allocate storage for local variables on the stack. You will see more on these instructions in a later chapter. This chapter does not consider them since they are not particularly useful outside the context of procedure entry and exit.

"So what do these instructions do?" you're probably asking by now. The push instructions move data onto the 80x86 hardware stack and the pop instructions move data from the stack to memory or to a register. The following is an algorithmic description of each instruction:

push instructions (16 bits):

	SP := SP - 2
	[SS:SP] := 16 bit operand (store result at location SS:SP.)

pop instructions (16 bits): 
	16-bit operand := [SS:SP]
	SP := SP + 2

push instructions (32 bits): 
	SP := SP - 4
	[SS:SP] := 32 bit operand

pop instructions (32 bits): 
	32 bit operand := [SS:SP]
	SP := SP + 4

You can treat the pusha/pushad and popa/popad instructions as equivalent to the corresponding sequence of 16 or 32 bit push/pop operations (e.g., push ax, push cx, push dx, push bx, etc.).

Notice three things about the 80x86 hardware stack. First, it is always in the stack segment (wherever ss points). Second, the stack grows down in memory. That is, as you push values onto the stack the CPU stores them into successively lower memory locations. Finally, the 80x86 hardware stack pointer (ss:sp) always contains the address of the value on the top of the stack (the last value pushed on the stack).

You can use the 80x86 hardware stack for temporarily saving registers and variables, passing parameters to a procedure, allocating storage for local variables, and other uses. The push and pop instructions are extremely valuable for manipulating these items on the stack. You'll get a chance to see how to use them later in this text.

Most of the push and pop instructions do not affect any of the flags in the 80x86 processor status register. The popf/popfd instructions, by their very nature, can modify all the flag bits in the 80x86 processor status register (flags register). Pushf and pushfd push the flags onto the stack, but they do not change any flags while doing so.

All pushes and pops are 16 or 32 bit operations. There is no (easy) way to push a single eight bit value onto the stack. To push an eight bit value you would need to load it into the H.O. byte of a 16 bit register, push that register, and then add one to the stack pointer. On all processors except the 8088, this would slow future stack access since sp now contains an odd address, misaligning any further pushes and pops. Therefore, most programs push or pop 16 bits, even when dealing with eight bit values.

Although it is relatively safe to push an eight bit memory variable, be careful when popping the stack to an eight bit memory location. Pushing an eight bit variable with push word ptr ByteVar pushes two bytes, the byte in the variable ByteVar and the byte immediately following it. Your code can simply ignore the extra byte this instruction pushes onto the stack. Popping such values is not quite so straight forward. Generally, it doesn't hurt if you push these two bytes. However, it can be a disaster if you pop a value and wipe out the following byte in memory. There are only two solutions to this problem. First, you could pop the 16 bit value into a register like ax and then store the L.O. byte of that register into the byte variable. The second solution is to reserve an extra byte of padding after the byte variable to hold the whole word you will pop. Most programs use the former approach.

6.3.6 The LAHF and SAHF Instructions

The lahf (load ah from flags) and sahf (store ah into flags) instructions are archaic instructions included in the 80x86's instruction set to help improve compatibility with Intel's older 8080 mP chip. As such, these instructions have very little use in modern day 80x86 programs. The lahf instruction does not affect any of the flag bits. The sahf instruction, by its very nature, modifies the S, Z, A, P, and C bits in the processor status register. These instructions do not require any operands and you use them in the following manner:

		sahf
		lahf

Sahf only affects the L.O. eight bits of the flags register. Likewise, lahf only loads the L.O. eight bits of the flags register into the AH register. These instructions do not deal with the overflow, direction, interrupt disable, or trace flags. The fact that these instructions do not deal with the overflow flag is an important limitation.

Sahf has one major use. When using a floating point processor (8087, 80287, 80387, 80486, Pentium, etc.) you can use the sahf instruction to copy the floating point status register flags into the 80x86's flag register. You'll see this use in the chapter on floating point arithmetic (See Chapter Fourteen).

6.4 Conversions

The 80x86 instruction set provides several conversion instructions. They include movzx, movsx, cbw, cwd, cwde, cdq, bswap, and xlat. Most of these instructions sign or zero extend values, the last two convert between storage formats and translate values via a lookup table. These instructions take the general form:

                movzx   dest, src       ;Dest must be twice the size of src.
                movsx   dest, src       ;Dest must be twice the size of src.
                cbw
                cwd
                cwde
                cdq
                bswap   reg32
                xlat                    ;Special form allows an operand.

6.4.1 The MOVZX, MOVSX, CBW, CWD, CWDE, and CDQ Instructions

These instructions zero and sign extend values. The cbw and cwd instructions are available on all 80x86 processors. The movzx, movsx, cwde, and cdq instructions are available only on 80386 and later processors.

The cbw (convert byte to word) instruction sign extends the eight bit value in al to ax. That is, it copies bit seven of AL throughout bits 8-15 of ax. This instruction is especially important before executing an eight bit division (as you'll see in the section "Arithmetic Instructions" on page 255). This instruction requires no operands and you use it as follows:

cbw

The cwd (convert word to double word) instruction sign extends the 16 bit value in ax to 32 bits and places the result in dx:ax. It copies bit 15 of ax throughout the bits in dx. It is available on all 80x86 processors which explains why it doesn't sign extend the value into eax. Like the cbw instruction, this instruction is very important for division operations. Cwd requires no operands and you use it as follows

cwd

The cwde instruction sign extends the 16 bit value in ax to 32 bits and places the result in eax by copying bit 15 of ax throughout bits 16..31 of eax. This instruction is available only on the 80386 and later processors. As with cbw and cwd the instruction has no operands and you use it as follows:

		cwde

The cdq instruction sign extends the 32 bit value in eax to 64 bits and places the result in edx:eax by copying bit 31 of eax throughout bits 0..31 of edx. This instruction is available only on the 80386 and later. You would normally use this instruction before a long division operation. As with cbw, cwd, and cwde the instruction has no operands and you use it as follows:

cdq

If you want to sign extend an eight bit value to 32 or 64 bits using these instructions, you could use sequences like the following:

; Sign extend al to dx:ax

                cbw
                cwd

; Sign extend al to eax

                cbw
                cwde

; Sign extend al to edx:eax

                cbw
                cwde
                cdq

You can also use the movsx for sign extensions from eight to sixteen or thirty-two bits.

The movsx instruction is a generalized form of the

cbw,
cwd,

and cwde instructions. It will sign extend an eight bit value to a sixteen or thirty-two bits, or sign extend a sixteen bit value to a thirty-two bits. This instruction uses a mod-reg-r/m byte to specify the two operands. The allowable forms for this instruction are

                movsx   reg16, mem8
                movsx   reg16, reg8
                movsx   reg32, mem8
                movsx   reg32, reg8
                movsx   reg32, mem16
                movsx   reg32, reg16

Note that anything you can do with the cbw and cwde instructions, you can do with a movsx instruction:

                movsx   ax, al          ;CBW
                movsx   eax, ax         ;CWDE
                movsx   eax, al         ;CBW followed by CWDE

However, the cbw and cwde instructions are shorter and sometimes faster. This instruction is available only on the 80386 and later processors. Note that there are not direct movsx equivalents for the cwd and cdq instructions.

The movzx instruction works just like the movsx instruction, except it extends unsigned values via zero extension rather than signed values through sign extension. The syntax is the same as for the movsx instructions except, of course, you use the movzx mnemonic rather than movsx.

Note that if you want to zero extend an eight bit register to 16 bits (e.g., al to ax) a simple mov instruction is faster and shorter than movzx. For example,

		mov	bh, 0

is faster and shorter than

		movzx	bx, bl

Of course, if you move the data to a different 16 bit register (e.g.,

movzx
bx, al

) the movzx instruction is better.

Like the movsx instruction, the movzx instruction is available only on 80386 and later processors. The sign and zero extension instructions do not affect any flags.

6.4.2 The BSWAP Instruction

The bswap instruction, available only on 80486 (yes, 486) and later processors, converts between 32 bit little endian and big endian values. This instruction accepts only a single 32 bit register operand. It swaps the first byte with the fourth and the second byte with the third. The syntax for the instruction is

		bswap	reg32

where reg32 is an 80486 32 bit general purpose register.

The Intel processor families use a memory organization known as little endian byte organization. In little endian byte organization, the L.O. byte of a multi-byte sequence appears at the lowest address in memory. For example, bits zero through seven of a 32 bit value appear at the lowest address; bits eight through fifteen appear at the second address in memory; bits 16 through 23 appear in the third byte, and bits 24 through 31 appear in the fourth byte.

Another popular memory organization is big endian. In the big endian scheme, bits twenty-four through thirty-one appear in the first (lowest) address, bits sixteen through twenty-three appear in the second byte, bits eight through fifteen appear in the third byte, and bits zero through seven appear in the fourth byte. CPUs such as the Motorola 68000 family used by Apple in their Macintosh computer and many RISC chips employ the big endian scheme.

Normally, you wouldn't care about byte organization in memory since programs written for an Intel processor in assembly language do not run on a 68000 processor. However, it is very common to exchange data between machines with different byte organizations. Unfortunately, 16 and 32 bit values on big endian machines do not produce correct results when you use them on little endian machines. This is where the bswap instruction comes in. It lets you easily convert 32 bit big endian values to 32 bit little endian values.

One interesting use of the bswap instruction is to provide access to a second set of 16 bit general purpose registers. If you are using only 16 bit registers in your code, you can double the number of available registers by using the bswap instruction to exchange the data in a 16 bit register with the H.O. word of a thirty-two bit register. For example, you can keep two 16 bit values in eax and move the appropriate value into ax as follows:

        < Some computations that leave a result in AX >

                bswap   eax

        < Some additional computations involving AX >

                bswap   eax

        < Some computations involving the original value in AX >

                bswap   eax

        < Computations involving the 2nd copy of AX from above >

You can use this technique on the 80486 to obtain two copies of

ax,
bx, cx, dx, si, di,

and bp. You must exercise extreme caution if you use this technique with the sp register.

Note: to convert 16 bit big endian values to 16 bit little endian values just use the 80x86 xchg instruction. For example, if ax contains a 16 bit big endian value, you can convert it to a 16 bit little endian value (or vice versa) using:

		xchg	al, ah

The bswap instruction does not affect any flags in the 80x86 flags register.

6.4.3 The XLAT Instruction

The xlat instruction translates the value in the al register based on a lookup table in memory. It does the following:

		temp := al+bx
		al := ds:[temp]

that is, bx points at a table in the current data segment. Xlat replaces the value in al with the byte at the offset originally in al. If al contains four, xlat replaces the value in al with the fifth item (offset four) within the table pointed at by ds:bx. The xlat instruction takes the form:

		xlat

Typically it has no operand. You can specify one but the assembler virtually ignores it. The only purpose for specifying an operand is so you can provide a segment override prefix:

		xlat	es:Table

This tells the assembler to emit an es: segment prefix byte before the instruction. You must still load bx with the address of Table; the form above does not provide the address of Table to the instruction. Only the segment override prefix in the operand is significant.

The xlat instruction does not affect the 80x86's flags register.

6.0 - Chapter Overview
6.1 - The Processor Status Register (Flags)
6.2 - Instruction Encodings
6.3 - Data Movement Instructions
6.3.1 - The MOV Instruction
6.3.2 - The XCHG Instruction
6.3.3 - The LDS, LES, LFS, LGS, and LSS Instructions
6.3.4 - The LEA Instruction
6.3.5 - The PUSH and POP Instructions
6.3.6 - The LAHF and SAHF Instructions
6.4 - Conversions
6.4.1 - The MOVZX, MOVSX, CBW, CWD, CWDE, and CDQ Instructions
6.4.2 - The BSWAP Instruction
6.4.3 - The XLAT Instruction
6.5 - Arithmetic Instructions
6.5.1 - The Addition Instructions: ADD, ADC, INC, XADD, AAA, and DAA
6.5.1.1 - The ADD and ADC Instructions
6.5.1.2 - The INC Instruction
6.5.1.3 - The XADD Instruction
6.5.1.4 - The AAA and DAA Instructions
6.5.2 - The Subtraction Instructions: SUB, SBB, DEC, AAS, and DAS
6.5.3 - The CMP Instruction
6.5.4 - The CMPXCHG, and CMPXCHG8B Instructions
6.5.5 - The NEG Instruction
6.5.6 - The Multiplication Instructions: MUL, IMUL, and AAM
6.5.7 - The Division Instructions: DIV, IDIV, and AAD
6.6 - Logical, Shift, Rotate and Bit Instructions
6.6.1 - The Logical Instructions: AND, OR, XOR, and NOT
6.6.2 - The Shift Instructions: SHL/SAL, SHR, SAR, SHLD, and SHRD
6.6.2.1 - SHL/SAL
6.6.2.2 - SAR
6.6.2.3 - SHR
6.6.2.4 - The SHLD and SHRD Instructions
6.6.3 - The Rotate Instructions: RCL, RCR, ROL, and ROR
6.6.3.1 - RCL
6.6.3.2 - RCR
6.6.3.3 - ROL
6.6.3.4 - ROR
6.6.4 - The Bit Operations
6.6.4.1 - TEST
6.6.4.2 - The Bit Test Instructions: BT, BTS, BTR, and BTC
6.6.4.3 - Bit Scanning: BSF and BSR
6.6.5 - The "Set on Condition" Instructions
6.7 - I/O Instructions
6.8 - String Instructions
6.9 - Program Flow Control Instructions
6.9.1 - Unconditional Jumps
6.9.2 - The CALL and RET Instructions
6.9.3 - The INT, INTO, BOUND, and IRET Instructions
6.9.4 - The Conditional Jump Instructions
6.9.5 - The JCXZ/JECXZ Instructions
6.9.6 - The LOOP Instruction
6.9.7 - The LOOPE/LOOPZ Instruction
6.9.8 - The LOOPNE/LOOPNZ Instruction
6.10 - Miscellaneous Instructions
6.11 - Sample Programs
6.11.1 - Simple Arithmetic I
6.11.2 - Simple Arithmetic II
6.11.3 - Logical Operations
6.11.4 - Shift and Rotate Operations
6.11.5 - Bit Operations and SETcc Instructions
6.11.6 - String Operations
6.11.7 - Conditional Jumps
6.11.8 - CALL and INT Instructions
6.11.9 - Conditional Jumps I
6.11.10 - Conditional Jump Instructions II

Art of Assembly: Chapter Six - 26 SEP 1996

[Next] [Art of Assembly][Randall Hyde]